Abstract - This paper reports on ongoing research and
development at the U.S. Army Research Laboratory (ARL) to
support automated fusion centered on human-collected
information and its processing. Our overall approach is
three-pronged: exploit soft information sources such as
human-generated reports and open-source information,
develop discrete services, and focus on the end-user.

Soft  information  sources  provide  information  on 
relationships  between  individuals,  organizations, 
locations,  time,  and  events.  Extracting  this  information 
and  structuring  it  for  efficient  computer  processing  is  a 
challenge  which  must  be  overcome  to  provide  better 
analytical  support  to  the  user. 

Discrete services provide for the development of
components which can be composed into a system.
Processing services such as text extractors, web
crawlers, and graph processors within a Service-Oriented
Architecture provide modularity and scalability.

The  focus  on  the  end-user  grounds  the  research  in  the 
real  needs  of  soldiers  on  the  ground  in  the  near-term  and 
the  future. 
Introduction

Current counter-insurgency operations reinforce the need
for efficient processing of human-authored information
[1]. Information gathered from people is the basis of
most intelligence in counter-insurgency operations [1].
Face-to-face interviews, interrogation reports, and
overheard chatter are traditional sources of information
used to identify and track individuals, develop
associations between them, link them to events, and
develop lists of key individuals in organizations, or
High Value Individuals (HVI) [1]. The research described
in this paper is an attempt to understand and utilize the
rich content of human-collected information for better
situational awareness, which corresponds to Level 2 of
the Joint Directors of Laboratories (JDL) Fusion Model
[2].

The U.S. Army collects hundreds of human-generated
reports per day, shared through semi-structured messages.
The content of these messages is, by and large,
unstructured. An early U.S. Army digital device, the
AN/PSG 22A (DMD), used for Field Artillery Calls for
Fire [3], created and transmitted highly structured
messages in which each field had an enumerated type
entered using a menu. But even the DMD provided the
ubiquitous “Freetext” message to allow the user to
transmit information of interest outside the confines of
the structured messages [3].

Given the availability and nature of human-generated
information, the issue of processing this data becomes
important. What processing framework is required? How
can one represent the data for computer reasoning? How
can one present the data to the user in the most
understandable fashion? ARL is researching these issues
and is developing a prototype that demonstrates newly
developed computational tools and provides a means to
assess their capabilities as they become available.


Processing  Framework 

With an eye to the end-user’s needs, let us postulate the
following scenario. Given a large corpus of data
collected by human agents, an analyst may seek the
following information: what is the relationship between
actor A and event B? To investigate this issue, the
analyst would read multiple reports, create a link
diagram, and document the evidence to support any
conclusions and recommendations. Automation can assist
this workflow at multiple points. The first is the
gathering of relevant information, for which a user
interface with a query capability will suffice. Google®
is an example of how easy it is to gather information
with a few keywords or phrases. The data must be in a
suitable form for such a search to work automatically,
however, and this is the problem with unstructured
information. Past research on human intelligence (HUMINT)
messages demonstrates the difficulty of extracting
information from unstructured data [4] [5]. The second
form of automation assistance is the development of link
diagrams. Semantic web technologies, namely the Resource
Description Framework (RDF) and the Web Ontology Language
(OWL), can assist this process [6]. Lastly, the system
can assist by visualizing the results for the user.

The Service-Oriented Architecture used in our project is
the Distributed Common Ground System-Army (DCGS-A)
Application Framework (DAF). The DAF provides an
interface and functions to host services in a windowed
environment, subscribe to other services, and publish
services. The framework (see Figure 1) consists of the
Multi-Function Workstation (MFWS), a DAF application
which provides the interfaces necessary for service
interaction, and web services which compute graph
metrics, process SPARQL queries, and provide dimension
reduction for understandability. The web services are
connected through the DAF to the rest of the MFWS
components, such as the mapping service, link visualizer,
ontology viewer, and message viewer.
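
To make the scenario concrete, the question "what is the relationship between actor A and event B?" can be pictured as a search over extracted facts. The following is a minimal sketch only, not the DAF services themselves. It assumes a handful of hypothetical hand-entered facts stored as (subject, predicate, object) tuples, uses invented predicate names such as memberOf, and substitutes plain Python pattern matching and breadth-first search for an RDF store and a SPARQL engine.

```python
from collections import deque

# Hypothetical facts as (subject, predicate, object) triples.
# Entity and predicate names are invented for illustration.
TRIPLES = [
    ("ActorA", "memberOf", "OrganizationA"),
    ("PersonC", "memberOf", "OrganizationA"),
    ("PersonC", "participatedIn", "EventB"),
    ("PersonD", "locatedAt", "LocationX"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def find_link(triples, start, goal):
    """Breadth-first search for the shortest chain of facts joining two nodes.

    Triples are treated as undirected edges. Returns the list of triples
    along the chain, or None when the nodes are not connected.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for triple in triples:
            subj, _, obj = triple
            if subj == node:
                neighbor = obj
            elif obj == node:
                neighbor = subj
            else:
                continue
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [triple]))
    return None


# Who belongs to OrganizationA, and how is ActorA tied to EventB?
members = match(TRIPLES, p="memberOf", o="OrganizationA")
chain = find_link(TRIPLES, "ActorA", "EventB")
```

Here `chain` holds the three facts that connect ActorA to EventB through OrganizationA and PersonC, which is the evidence trail an analyst would otherwise assemble by hand into a link diagram.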

Figure 1: Technology Framework


Data Representation

RDF/OWL is a powerful method to represent data and the
relationships between them. The representation schema
developed is based on triples, a well-known method that
uses subject-predicate-object triples to represent nodes
and the links between them. Part of our research to date
is in automated development of the triple store.
Extraction technology is essential to this effort.
Presently, we are working to automatically load
extracted triples into our knowledge base and database
schema.

Our knowledge base is based on an ontology [7]. Created
from a data modeling effort, the ontology has as its
major concepts individuals, organizations, events,
locations, and time. Given a knowledge base, the analyst
or system developer can formulate inference rules such
as: if person A is related to person B and person C
knows person B, then person A is associated with person
C [8].

Knowledge bases are useful for generating graphs, since
relationships are explicit. Given these explicit
relationships, it is trivial to draw the resulting
network, as shown in Figure 2. The use of graphs in
social network analysis [9] provides a sound theoretical
basis for understanding the relationships between
entities. Graphs have properties such as degree,
betweenness, and closeness centrality [9], which are easy
to calculate for small graphs but provide important
information.

Currently we build the knowledge base manually. A
trained operator extracts facts from unstructured text by
hand and inserts them, for example "PersonA is a member
of OrganizationA", into the knowledge base. Automating
this process is key to enabling automated fusion, and is
an ongoing research project. We are also using the
extracted results from the Soft Target Exploitation and
Fusion Army Technology Objective effort and experimenting
to determine the best method to insert the facts into the
knowledge base.

Richard Antony [10] has developed a method for
describing fusion functions which provides a framework
for developing methods to solve fusion problems. He
uses eight canonical forms to describe the relationships
among three elements: entity (E), location (L), and time
(T). This taxonomy allows the categorization of
problems such as {same E, near L, near T}, characteristic
of a radar tracking problem, and {different E, not near L,
near T}, characteristic of a cell phone call [11]. The
Antony Fusion Forms provide a consistent framework to
develop algorithms based on context. We believe this
method will provide a robust means to build services
which are useful and deterministic.


Visualization


Presenting  information  to  the  user  is  an  important 
function  in  any  interactive  system.  Ease  of  use, 
flexibility,  reliability,  and  timeliness  of  results  are  some 
features  of  a  good  interface. 

High-dimensional data, with attributes in the hundreds,
are difficult for humans to understand. Computer
graphics methods to pan, zoom, and rotate alleviate this
somewhat, but are insufficient. We are using a method
originally developed in the social sciences for
exploratory data analysis, Multi-Dimensional Scaling
(MDS) [12], to reduce the dimensionality of the vector
space to a two- or three-dimensional space which is
easier to understand.

Although MDS techniques have been around for a long
time, they have not been applied to complex information
sources such as HUMINT. We are using MDS on our datasets
to determine how the technique assists in discovering
HVIs. To date, we have created a notional database of
individuals with different characteristics (terrorist,
neutral, friendly, common criminal) and analyzed it
using MDS. The technique correctly clusters individuals
who share some of these characteristics. This is one
technique we are exploring, and we will develop others as
we gain experience in using them in a service-oriented
architecture.
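
The dimension reduction step can be illustrated with classical (Torgerson) MDS, which is one of several MDS variants; the text does not say which variant the service uses, so this choice is an assumption. The sketch below also assumes hypothetical binary attribute vectors for six notional individuals and Euclidean distance between them. It finds the two leading eigenvectors by power iteration so that it needs only the Python standard library.

```python
import math

# Hypothetical attribute vectors for six notional individuals.
# The first three share one set of traits, the last three another.
PEOPLE = {
    "P1": [1, 1, 1, 0, 0, 0],
    "P2": [1, 1, 0, 0, 0, 0],
    "P3": [1, 0, 1, 0, 0, 0],
    "P4": [0, 0, 0, 1, 1, 1],
    "P5": [0, 0, 0, 1, 1, 0],
    "P6": [0, 0, 0, 0, 1, 1],
}


def distance_matrix(vectors):
    """Pairwise Euclidean distances between attribute vectors."""
    return [[math.dist(a, b) for b in vectors] for a in vectors]


def classical_mds(dist, dims=2, iters=500):
    """Embed n points in `dims` dimensions from an n x n distance matrix."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double-center the squared distances to obtain the Gram matrix.
    gram = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
             for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        vec = [1.0 + 0.1 * i for i in range(n)]  # deterministic start
        for _ in range(iters):
            nxt = [sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0.0:
                break
            vec = [x / norm for x in nxt]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        lam = sum(vec[i] * sum(gram[i][j] * vec[j] for j in range(n))
                  for i in range(n))
        if lam <= 0.0:
            break  # no further positive eigenvalue to use
        for i in range(n):
            coords[i][d] = math.sqrt(lam) * vec[i]
        # Deflate so the next pass finds the next eigenvector.
        gram = [[gram[i][j] - lam * vec[i] * vec[j] for j in range(n)]
                for i in range(n)]
    return coords


names = list(PEOPLE)
coords = classical_mds(distance_matrix([PEOPLE[k] for k in names]))


def gap(i, j):
    """Distance between individuals i and j in the reduced space."""
    return math.dist(coords[i], coords[j])
```

In the resulting two-dimensional layout, individuals with shared traits fall close together: `gap(0, 1)` (P1 to P2) is smaller than `gap(0, 3)` (P1 to P4). A production service would more likely rely on an established numerical library than on hand-written power iteration.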

We are working with Dr. James Llinas at the State
University of New York, Buffalo, to research how graph
matching can be used to discover HVIs [13]. A graph
matching algorithm, called Truncated Search Tree
(TruST), uses heuristic search to mitigate the NP-hard
nature of the problem [14]. TruST finds the nearest
match to a given template graph. Thus, given a candidate
terrorist network from a knowledge base query, TruST
can be used to compare this network against known
networks to discover a match.


User Directed Fusion


Figure 2: Prototype Analyst Workstation


The prototype system we developed (Figure 2) is a
Proof-of-Concept Analyst Workstation for discovering
relationships. Composed of a user interface, database,
and services, the system allows a user to make queries,
visualize results using link diagrams, and view messages
and their ontological form. The interface addresses the
needs of an analyst performing an investigation. Given a
list of HVIs, the investigator may attempt to refine an
earlier theory, uncover unknown HVIs, or look for links
between two seemingly unrelated events. Given a
suitable knowledge base, this interface allows the analyst
to perform these tasks. First, a query on the HVI of
interest (HVIi) yields a link diagram of the other
people, locations, and events with which he is
associated. Computation of centrality metrics on the
network reveals that another entity, a probable HVI
(pHVI), has a high betweenness value between two cells.
A subsequent query on pHVI reveals an association with an
event that was not on the network for HVIi. The analyst
opens the notepad feature and notes the new link. The
analyst may also enter this information into the
knowledge base explicitly. The MDS service shows that
pHVI clusters not with friendlies but with terrorists.
The analyst might then recommend that pHVI be classified
as an HVI.
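
The centrality step in this walkthrough can be sketched as follows. The network is hypothetical (two three-person cells joined only through pHVI), and the function is a standard implementation of Brandes' algorithm for unweighted, undirected graphs rather than the code of the prototype's graph metrics service.

```python
from collections import deque

# Hypothetical link diagram: two cells that touch only through pHVI.
EDGES = [
    ("HVI1", "A"), ("HVI1", "B"), ("A", "B"),   # first cell
    ("C", "D"), ("C", "E"), ("D", "E"),         # second cell
    ("B", "pHVI"), ("pHVI", "C"),               # the only bridge
]


def build_adjacency(edges):
    """Turn an undirected edge list into a node -> neighbors mapping."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def betweenness(adj):
    """Unnormalized betweenness centrality via Brandes' algorithm."""
    score = {v: 0.0 for v in adj}
    for source in adj:
        # Breadth-first search counting shortest paths from the source.
        order = []
        preds = {v: [] for v in adj}
        paths = {v: 0 for v in adj}
        depth = {v: -1 for v in adj}
        paths[source] = 1
        depth[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if depth[w] < 0:
                    depth[w] = depth[v] + 1
                    queue.append(w)
                if depth[w] == depth[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back inward.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: s / 2 for v, s in score.items()}


scores = betweenness(build_adjacency(EDGES))
top = max(scores, key=scores.get)
```

In this notional network every shortest path between the two cells runs through pHVI, so `top` is `"pHVI"`, which is the kind of signal that would prompt the analyst's follow-up query.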

Summary 

We are developing technology to provide automation
support for Level 2 fusion of HUMINT. ARL is developing
a framework and services, and is working with external
researchers to provide automation assistance to the
analyst. This research is in its early stages, as soft
fusion is a relatively new science. The metrics remain
undefined at this time, so it is premature to assess its
effectiveness quantitatively. However, it is clear that
users of soft fusion must reach the level of confidence
that users of hard fusion have achieved if soft fusion is
to truly become a useful tool in the intelligence
arsenal.